Constraint driven schema merging
Abstract
Schema integration is the process of consolidating several source schemas into a unified view, called the mediated schema, so that information scattered across the sources can be served uniformly from the mediated schema. Schema integration occurs in many scenarios, such as data integration, logical database design, data warehousing, and schema evolution. To make the mediated schema useful for data interoperability tasks, mappings between the source schemas and the mediated schema have to be derived. Previous approaches fall short in two respects. First, the identification of inter-schema relationships (i.e., schema matching) is usually mixed with the process of combining and restructuring schemas (i.e., schema merging). This coupling of schema matching and schema merging increases both the complexity of the schema integration process and the human intervention it requires. Second, the schema mappings are either conceptual alignments between entity types or syntactic correspondences between attributes. Neither of these two mapping languages is able to express complex relationships among several modeling constructs. Logical schema mappings in the form of data dependencies are able to express such complex relationships but are less explored for schema merging. In this thesis, we propose a new approach to schema merging using logical schema mappings, more specifically tuple-generating dependencies (tgds) and equality-generating dependencies (egds). We provide well-founded semantics of schema merging under two scenarios: view integration and data integration. Based on the formal characterization of the schema merging problem, we develop a schema minimization approach which generates minimal mediated schemas with the same query answering capacity as the source schemas. We study the complexity of the proposed algorithms and show that the schema minimization problems are intractable in the general case. However, we have identified syntactic constraints on the input mappings which ensure that the proposed algorithms are in PTIME. In addition, we have implemented the schema merging algorithms in a prototype. The evaluation on real-world and synthetic data sets shows the applicability and scalability of the approach.
Similar resources
Query Propagation in a P2P Data Integration System in the Presence of Schema Constraints
This paper addresses the problem of data integration in a P2P environment, where each peer stores schema of its local data, mappings between the schemas, and some schema constraints. The goal of the integration is to answer queries formulated against a chosen peer. The answer consists of data stored in the queried peer as well as data of its direct and indirect partners. We focus on defining an...
Labelling Business Entities in a Canonical Data Model
Enterprises express the concepts of their electronic business-to-business (B2B) communication in individual ontology-like schemas. Collaborations require merging schemas’ common concepts into Business Entities (BEs) in a Canonical Data Model (CDM). Although consistent, automatic schema merging is state of the art, the task of labeling the BEs with descriptive, yet short and unique names, remain...
Generic Schema Merging
Schema merging is the process of integrating several schemas into a common, unified schema. There have been various approaches to schema merging, focusing on particular modeling languages, or using a lightweight, abstract metamodel. Having a semantically rich representation of models and mappings is particularly important for merging as semantic information is required to resolve the conflicts ...
Merging Models Based on Given Correspondences
A model is a formal description of a complex application artifact, such as a database schema, an application interface, a UML model, an ontology, or a message format. The problem of merging such models lies at the core of many meta data applications, such as view integration, mediated schema creation for data integration, and ontology merging. This paper examines the problem of merging two mode...
Schema Matching and Schema Merging based on Uncertain Semantic Mappings
This dissertation lies in the research area of schema integration: the problem of combining the data of different data sources by creating a unified representation of these data. Two core issues in schema integration are schema matching, i.e. the identification of correspondences, or mappings, between input schema objects, and schema merging, i.e. the creation of a unified schema based on the i...